47 research outputs found

    Exposing the limits of zero-shot cross-lingual hate speech detection

    Reducing and counteracting hate speech on social media is a significant concern. Most of the proposed automatic methods target English exclusively, and very few consistently labeled non-English resources have been proposed. Learning to detect hate speech on English data and transferring to unseen languages seems an immediate solution. This work is the first to shed light on the limits of this zero-shot, cross-lingual transfer learning framework for hate speech detection. We use benchmark datasets in English, Italian, and Spanish to detect hate speech towards immigrants and women. Investigating post-hoc explanations of the model, we discover that non-hateful, language-specific taboo interjections are misinterpreted as signals of hate speech. Our findings demonstrate that zero-shot, cross-lingual models cannot be used as they are, but need to be carefully designed.
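    The zero-shot transfer setup described in this abstract can be sketched in a few lines: fine-tune a multilingual encoder on English data only, then apply it unchanged to Italian and Spanish inputs. The snippet below is a minimal illustration with toy placeholder sentences standing in for the benchmark datasets; the model choice (multilingual BERT), the example texts, and the hyperparameters are assumptions for demonstration, not the paper's exact configuration.

```python
# Minimal sketch of zero-shot cross-lingual hate speech detection:
# train on English only, evaluate on unseen languages.
# Toy placeholder data; mBERT and hyperparameters are assumptions.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Placeholder English training data (label 1 = hateful, 0 = not hateful).
train_texts = ["I hate all immigrants", "What a lovely day"]
train_labels = torch.tensor([1, 0])

# Fine-tune on English only.
optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy data
    batch = tokenizer(train_texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=train_labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Zero-shot evaluation: apply the English-trained model to unseen languages.
test_texts = [
    "Odio tutti gli immigrati",  # Italian
    "Qué día tan bonito",        # Spanish
]
model.eval()
with torch.no_grad():
    batch = tokenizer(test_texts, padding=True, truncation=True, return_tensors="pt")
    preds = model(**batch).logits.argmax(dim=-1)
print(preds.tolist())  # predicted labels for the non-English examples
```

    The paper's point is that this pipeline, applied as-is, can misread language-specific cues (such as taboo interjections) as hate, which is why the transfer step needs careful design rather than direct reuse.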

    Measuring Harmful Representations in Scandinavian Language Models

    Scandinavian countries are perceived as role models when it comes to gender equality. With the advent of pre-trained language models and their widespread usage, we investigate to what extent gender-based harmful and toxic content exists in selected Scandinavian language models. We examine nine models, covering Danish, Swedish, and Norwegian, by manually creating template-based sentences and probing the models for completions. We evaluate the completions using two methods for measuring harmful and toxic completions and provide a thorough analysis of the results. We show that Scandinavian pre-trained language models contain harmful and gender-based stereotypes, with similar values across all languages. This finding goes against the general expectations related to gender equality in Scandinavian countries and shows the possible problematic outcomes of using such models in real-world settings.
    Comment: Accepted at the 5th Workshop on Natural Language Processing and Computational Social Science (NLP+CSS) at EMNLP 2022 in Abu Dhabi, Dec 7, 2022.
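    The template-based probing described here amounts to masking a slot in a hand-written sentence and inspecting a model's top completions. The sketch below illustrates the idea with an openly available Swedish BERT ("KB/bert-base-swedish-cased") and two illustrative gendered templates; the actual nine models, the paper's templates, and its two harmfulness measures are not reproduced here, so everything model- and template-specific is an assumption.

```python
# Minimal sketch of template-based probing for gendered completions.
# Model name and templates are illustrative assumptions, not the paper's setup.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="KB/bert-base-swedish-cased")
mask = fill_mask.tokenizer.mask_token

# Gendered templates; the model fills the masked slot with its top completions.
templates = [
    f"Kvinnan arbetar som {mask}.",  # "The woman works as a [MASK]."
    f"Mannen arbetar som {mask}.",   # "The man works as a [MASK]."
]

for template in templates:
    print(template)
    for completion in fill_mask(template, top_k=5):
        # Each completion could then be scored for harmfulness or toxicity,
        # analogous to the two measures applied in the paper.
        print(f"  {completion['token_str']:<15} {completion['score']:.3f}")
```

    Comparing the scored completions across the gendered variants of each template is what surfaces the stereotypes the paper reports.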

    Language invariant properties in Natural Language Processing


    XLM-EMO: multilingual emotion prediction in social media text


    Overview of the Evalita 2018 task on Automatic Misogyny Identification (AMI)

    Automatic Misogyny Identification (AMI) is a new shared task proposed for the first time at the Evalita 2018 evaluation campaign. The AMI challenge, based on both Italian and English tweets, comprises two subtasks: Subtask A on misogyny identification and Subtask B on misogynistic behaviour categorization and target classification. For Italian, we received a total of 13 runs for Subtask A and 11 runs for Subtask B; for English, we received 26 submissions for Subtask A and 23 runs for Subtask B. The participating systems are distinguished by language, with 6 teams for Italian and 10 teams for English. We present here an overview of the AMI shared task, the datasets, the evaluation methodology, the results obtained by the participants, and a discussion of the methodologies adopted by the teams. Finally, we draw some conclusions and discuss future work.

    FEEL-IT: emotion and sentiment classification for the Italian language


    HATE-ITA: hate speech detection in Italian social media text
